Zero Inflated Models for Inpatient Admissions

Published

October 22, 2025

We compare the Unicare study cohort against all other members, using cohort as well as select other covariates. Even when ignorability, balance, and other assumptions hold, covariates often yield reduced bias and improved efficiency (lower variance) of estimates.

Regression Models

  • Negative binomial regression; intercept only and with cohort as covariate

  • Zero inflated Poisson and negative binomial; cohort and select other covariates on count and zero inflation portions of the model

  • Bayesian versions of the above

Modeling hospital admissions counts

We consider admissions occurring within a 270-day window of observation.

cohort count
control 1051
ibis 117

Admissions counts

Overall

mean variance
0.512 1.39

By cohort

cohort mean variance
control 0.549 1.511
ibis 0.179 0.218

With the overall variance well above the mean, a Poisson model is not appropriate. We consider a negative binomial model, which allows for overdispersion. Note, though, that the dispersion parameter is a global parameter; it is not fit separately for each cohort. That said, since the model's variance is a function of the mean, the variance given by the model will still differ by cohort.
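As a sanity check on the negative binomial variance function \(\mathrm{Var}(Y) = \mu + \mu^2/\theta\), we can plug in the overall sample mean and the dispersion estimate from the intercept-only model reported below. A minimal sketch (values taken from this document):

```python
# Negative binomial variance function: Var(Y) = mu + mu^2 / theta.
# mu is the overall sample mean; theta is the intercept-only model's
# dispersion estimate, both reported in this analysis.
mu = 0.512
theta = 0.319

nb_variance = mu + mu**2 / theta
print(f"implied NB variance: {nb_variance:.3f}")  # close to the observed 1.39
```

The implied variance is close to the observed overall variance, consistent with the overdispersion the NB model is meant to capture.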

Covariates, multicollinearity

Perhaps surprisingly, the age and chronic condition covariates do not exhibit high multicollinearity. This means not only that there are no high pairwise correlations, but also that no one covariate is well explained by the others: regressing any one on the rest would result in a low R-squared value.

As an illustration of the opposite, when a total condition count is included, the covariates become perfectly collinear, since the count is the sum of the individual condition indicators.

Low multicollinearity is a desirable feature for generalized linear models, as it results in more stable coefficient estimates.

Condition number

We find the condition number of the model matrix with age and the other covariates. This is the square root of the ratio of the maximum and minimum eigenvalues of the correlation matrix, and is thus the ratio of the standard deviations along the directions of maximum and minimum variation in the data:

\(\sqrt{\frac{\lambda_{max}}{\lambda_{min}}}\) = 2.19

As a rule of thumb, a condition number greater than 30 is an indication of high multicollinearity.
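For intuition, in the \(2 \times 2\) case the condition number has a closed form, since the eigenvalues of \([[1, r], [r, 1]]\) are \(1 \pm r\). A small sketch (the correlation values here are illustrative, not from the data):

```python
import math

# Condition number of a 2x2 correlation matrix [[1, r], [r, 1]]:
# eigenvalues are 1 + r and 1 - r, so it is sqrt((1 + |r|) / (1 - |r|)).
def condition_number_2x2(r: float) -> float:
    lam_max, lam_min = 1 + abs(r), 1 - abs(r)
    return math.sqrt(lam_max / lam_min)

print(condition_number_2x2(0.15))   # weak correlation -> near 1
print(condition_number_2x2(0.999))  # near-collinear   -> very large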

Correlations

Selecting covariates is also more straightforward under low multicollinearity: simply choosing those most highly correlated with the outcome is more justifiable than it would be under high multicollinearity.

condition correlation
atrial_fibrillation 0.2337
age 0.1525
chf 0.1354
copd 0.0793
lung_cancer 0.0770
kidney_disease 0.0651
urologic_cancer 0.0635
anemia 0.0570
chd 0.0539
dementia 0.0506
endometrial_cancer 0.0502
parkinsons 0.0461
alzheimers 0.0415
hip_fracture 0.0316
pneumonia 0.0223
heart_attack 0.0192
hypothyroidism 0.0179
mild_cognitive_impairment 0.0091
hypertension 0.0070
stroke 0.0033
hyperplasia 0.0027
diabetes -0.0031
glaucoma -0.0108
asthma -0.0150
breast_cancer -0.0157
cataract -0.0218
anxiety -0.0242
osteoporosis -0.0249
depression -0.0263
arthritis -0.0385
prostate_cancer -0.0440
hyperlipidemia -0.0493
colorectal_cancer -0.0582

We select age, atrial_fibrillation, and chf, variously in the models below.

Admissions counts; negative binomial

We first compare the intercept-only and cohort models.

\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 \end{aligned} \]

term estimate p.value exp_estimate
(Intercept) -0.669 0 0.512

\(\theta\): 0.319

\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \end{aligned} \]

term estimate p.value exp_estimate
(Intercept) -0.60 0 0.549
cohortibis -1.12 0 0.327

\(\theta\): 0.336

Model comparison

Negative binomial with vs without cohort as predictor


The likelihood ratio test, ANOVA, and the p-value on the coefficient are all equivalent here.

term estimate std.error statistic p.value
(Intercept) -0.60 0.068 -8.87 0
cohortibis -1.12 0.279 -4.01 0
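The statistic column above is the Wald \(z\), i.e. estimate divided by standard error. A quick sketch reproducing the cohortibis row (values from the table; two-sided normal p-value via the complementary error function):

```python
import math

# Wald test for a single coefficient: z = estimate / std.error,
# two-sided p-value from the standard normal via erfc.
estimate, std_error = -1.12, 0.279
z = estimate / std_error
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p = {p_value:.2e}")
```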

Predictive check, NB model for admissions counts

Model

\[\textrm{counts}_i \sim NB(\mu_i, \theta)\] \[\log \mu_i = \beta_0 + \beta_1 \textrm{cohort}_i\]

Observed vs predicted admissions count proportions; negative binomial

For the control cohort

inpatient_count frequency prevalence expected prob
0 761 0.725 757.699 0.722
1 151 0.144 157.845 0.150
2 71 0.068 65.416 0.062
3 35 0.033 31.605 0.030
4 19 0.018 16.355 0.016
5 4 0.004 8.801 0.008
6 4 0.004 4.857 0.005
7 1 0.001 2.728 0.003
8 0 0.000 1.552 0.001
9 3 0.003 0.892 0.001
10 0 0.000 0.517 0.000

For the ibis cohort

inpatient_count frequency prevalence expected prob
0 100 0.855 101.331 0.866
1 13 0.111 11.851 0.101
2 4 0.034 2.757 0.024
3 0 0.000 0.748 0.006
4 0 0.000 0.217 0.002
5 0 0.000 0.066 0.001
6 0 0.000 0.020 0.000
7 0 0.000 0.006 0.000
8 0 0.000 0.002 0.000
9 0 0.000 0.001 0.000
10 0 0.000 0.000 0.000

Zero inflated negative binomial

With cohort as covariate on both zero inflation and count portions of the model

This example is for illustration only.

\[ \begin{aligned} \textrm{counts}_i &\sim \text{ZINB}(\pi_i, \mu_i, \theta) \\ \log\left( \frac{\pi_i}{1 - \pi_i} \right) &= \gamma_0 + \gamma_1 \, \textrm{cohort}_i \\ \log(\mu_i) &= \beta_0 + \beta_1 \, \textrm{cohort}_i \end{aligned} \]

term estimate
count_(Intercept) -0.335
count_cohortibis -1.382
zero_(Intercept) -1.193
zero_cohortibis -6.505

\(\theta\): 0.505

As noted, this example is for illustration only; the coefficients are not significant, but see the models below.

Zero inflation probabilities

  • control: \(\textrm{logit}^{-1}(-1.193) = 0.233\)

  • ibis: \(\textrm{logit}^{-1}(-1.193 - 6.505) = 4.532 \times 10^{-4}\)

The count mean for ibis is also reduced, by a factor of \(\exp(-1.382) = 0.251\).
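A quick check of the arithmetic above, using a small inverse-logit helper with the fitted coefficients:

```python
import math

def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Zero-inflation probability per cohort from the fitted coefficients above.
p_control = inv_logit(-1.193)        # ~0.233
p_ibis = inv_logit(-1.193 - 6.505)   # ~4.5e-4

print(p_control, p_ibis)
```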

Observed vs expected admissions counts; zero inflated negative binomial

For the control cohort

inpatient_count frequency prevalence expected prob
0 761 0.725 759.552 0.724
1 151 0.144 152.621 0.145
2 71 0.068 67.325 0.064
3 35 0.033 32.954 0.031
4 19 0.018 16.927 0.016
5 4 0.004 8.940 0.009
6 4 0.004 4.808 0.005
7 1 0.001 2.619 0.002
8 0 0.000 1.440 0.001
9 3 0.003 0.798 0.001
10 0 0.000 0.445 0.000

For the ibis cohort

inpatient_count frequency prevalence expected prob
0 100 0.855 100.345 0.858
1 13 0.111 13.286 0.114
2 4 0.034 2.622 0.022
3 0 0.000 0.574 0.005
4 0 0.000 0.132 0.001
5 0 0.000 0.031 0.000
6 0 0.000 0.008 0.000
7 0 0.000 0.002 0.000
8 0 0.000 0.000 0.000
9 0 0.000 0.000 0.000
10 0 0.000 0.000 0.000

More models; summary and comparisons

Below we fit additional models, summarizing ten in all: eight from crossing the factors below, plus two that also place age on the zero-inflation portion. All the models use cohort, chf, and atrial_fibrillation as covariates.

  • zero inflated vs not

  • poisson vs negative binomial distribution

  • with and without age as a covariate.

Zero inflated age

Two of the models include age as a covariate on the zero-inflation part of the model, in addition to cohort, age, chf, and atrial_fibrillation on the count portion. We highlight those here. The full summary of all the models follows.

Zero inflated Poisson with age zero inflation

term component estimate p.value
(Intercept) count 1.3113 0.0001
cohortibis count -1.1633 0.0000
age count -0.0145 0.0019
chf count 0.1838 0.1292
atrial_fibrillation count 0.3695 0.0006
(Intercept) zero 3.8854 0.0000
age zero -0.0465 0.0000

Zero inflated negative binomial with age zero inflation

term component estimate p.value
(Intercept) count 0.5769 0.2944
cohortibis count -1.1460 0.0000
age count -0.0136 0.0463
chf count 0.3708 0.0424
atrial_fibrillation count 0.6520 0.0000
Log(theta) count -0.2263 0.4097
(Intercept) zero 4.0004 0.0000
age zero -0.0655 0.0000

Coefficients for all models

Poisson and negative binomial; no zero inflation

model (Intercept) cohortibis chf atrial_fibrillation age
Pois1 -0.878 -1.12 0.395 0.760 NA
Pois2 -1.533 -1.14 0.390 0.687 0.0093
NB1 -0.883 -1.06 0.421 0.753 NA
NB2 -1.329 -1.08 0.412 0.695 0.0064

Poisson and negative binomial; no zero inflation - long format with p-values

term estimate pval model
(Intercept) -0.8781 0.0000 Pois1
cohortibis -1.1217 0.0000 Pois1
chf 0.3946 0.0003 Pois1
atrial_fibrillation 0.7603 0.0000 Pois1
(Intercept) -1.5330 0.0000 Pois2
cohortibis -1.1447 0.0000 Pois2
age 0.0093 0.0159 Pois2
chf 0.3895 0.0004 Pois2
atrial_fibrillation 0.6868 0.0000 Pois2
(Intercept) -0.8831 0.0000 NB1
cohortibis -1.0646 0.0001 NB1
chf 0.4207 0.0324 NB1
atrial_fibrillation 0.7527 0.0000 NB1
(Intercept) -1.3289 0.0016 NB2
cohortibis -1.0830 0.0001 NB2
age 0.0064 0.2794 NB2
chf 0.4122 0.0359 NB2
atrial_fibrillation 0.6946 0.0000 NB2

Zero inflated Poisson and negative binomial.

term component estimate p.value model
(Intercept) count 0.2615 0.0013 ZIP1
cohortibis count -1.0836 0.0000 ZIP1
chf count 0.1673 0.1680 ZIP1
atrial_fibrillation count 0.3072 0.0030 ZIP1
(Intercept) zero 0.5336 0.0000 ZIP1
(Intercept) count -0.8830 0.0000 ZINB1
cohortibis count -1.0647 0.0001 ZINB1
chf count 0.4207 0.0314 ZINB1
atrial_fibrillation count 0.7527 0.0000 ZINB1
Log(theta) count -0.9474 0.0000 ZINB1
(Intercept) zero -9.8505 0.8728 ZINB1
(Intercept) count 0.4418 0.2044 ZIP2
cohortibis count -1.0793 0.0000 ZIP2
age count -0.0025 0.5945 ZIP2
chf count 0.1697 0.1623 ZIP2
atrial_fibrillation count 0.3238 0.0027 ZIP2
(Intercept) zero 0.5405 0.0000 ZIP2
(Intercept) count -1.3288 0.0011 ZINB2
cohortibis count -1.0830 0.0001 ZINB2
age count 0.0064 0.2659 ZINB2
chf count 0.4122 0.0347 ZINB2
atrial_fibrillation count 0.6947 0.0000 ZINB2
Log(theta) count -0.9405 0.0000 ZINB2
(Intercept) zero -10.0586 0.8747 ZINB2
(Intercept) count 1.3113 0.0001 ZIP3
cohortibis count -1.1633 0.0000 ZIP3
age count -0.0145 0.0019 ZIP3
chf count 0.1838 0.1292 ZIP3
atrial_fibrillation count 0.3695 0.0006 ZIP3
(Intercept) zero 3.8854 0.0000 ZIP3
age zero -0.0465 0.0000 ZIP3
(Intercept) count 0.5769 0.2944 ZINB3
cohortibis count -1.1460 0.0000 ZINB3
age count -0.0136 0.0463 ZINB3
chf count 0.3708 0.0424 ZINB3
atrial_fibrillation count 0.6520 0.0000 ZINB3
Log(theta) count -0.2263 0.4097 ZINB3
(Intercept) zero 4.0004 0.0000 ZINB3
age zero -0.0655 0.0000 ZINB3

Fit

We can use, eg, a chi-square test on the log likelihood to check whether the difference between nested models is statistically significant. See the tables above to infer the covariates and type of model for each model name.

model logLik df AIC BIC
Pois1 -1240 4 2487 2507
Pois2 -1237 5 2483 2509
NB1 -1059 5 2127 2153
NB2 -1058 6 2128 2158
ZIP1 -1106 5 2222 2248
ZINB1 -1059 6 2129 2160
ZIP2 -1106 6 2224 2254
ZINB2 -1058 7 2130 2165
ZIP3 -1087 7 2188 2223
ZINB3 -1049 8 2115 2155
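As a sketch of the chi-square test on nested models, we can compare NB1 and NB2 from the table (NB2 adds age, one extra parameter). For one degree of freedom the chi-square survival function reduces to a closed form, so no statistics library is needed:

```python
import math

# Likelihood ratio test between nested models. For df = 1 the chi-square
# survival function is P(X > x) = erfc(sqrt(x / 2)).
loglik_nb1, loglik_nb2 = -1059, -1058   # from the fit table above
lr_stat = 2 * (loglik_nb2 - loglik_nb1)
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"LR statistic = {lr_stat}, p = {p_value:.3f}")
```

The large p-value is consistent with the non-significant age coefficient in the NB2 coefficient table.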

Bayesian zero inflated Poisson regression

Bayesian models often offer some advantages. We get an entire distribution for every parameter modeled, not just point estimates and confidence intervals. We can also get distributions for any statistic.

We first consider a zero inflated model with intercept-only zero inflation portion, for quick comparison with the analogous frequentist model. We see that results are consistent, and in particular the credible intervals for the coefficients are bounded away from zero.

Bayes intercept only zero inflation, no age covariate

\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i + \beta_2\textrm{chf}_i + \beta_3\textrm{atrialfibrillation}_i\\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 \\ \end{aligned} \]

Model results; compare frequentist ZIP model

Bayes

component term Estimate Est.Error Q2.5 Q97.5
count Intercept 0.278 0.079 0.119 0.432
zero zi_Intercept 0.587 0.090 0.407 0.761
count cohortibis -0.867 0.219 -1.302 -0.444
count chf 0.159 0.119 -0.077 0.386
count atrial_fibrillation 0.283 0.100 0.085 0.480

Frequentist

component term Estimate Std. Error
count (Intercept) 0.262 0.081
count cohortibis -1.084 0.254
count chf 0.167 0.121
count atrial_fibrillation 0.307 0.103
zero (Intercept) 0.541 0.094

Posterior distributions

100 draws from the posterior predictive distribution

Posterior predictive checks

We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.

Bayes zero inflated with age covariate for count and zero inflation portions of the model

\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1\textrm{cohort}_i + \beta_2\textrm{age}_i + \beta_3\textrm{chf}_i + \beta_4\textrm{atrialfibrillation}_i\\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 + \gamma_1\textrm{age}_i \\ \end{aligned} \]

Note that for this model we scale the age covariate

\[ age \rightarrow \frac{\textrm{age} - 60}{10} \]

for interpretability and for numerical stability. This affects the value of the coefficient estimates, but not the model fit, significance, or effect on the outcome.

Posterior estimates, credible intervals

component term Estimate Est.Error Q2.5 Q97.5
count Intercept 0.442 0.087 0.268 0.607
zero zi_Intercept 1.079 0.121 0.843 1.318
count cohortibis -0.929 0.218 -1.367 -0.503
count age -0.126 0.046 -0.214 -0.034
count chf 0.170 0.120 -0.069 0.400
count atrial_fibrillation 0.335 0.106 0.128 0.542
zero zi_age -0.398 0.072 -0.541 -0.259

100 draws from the posterior predictive distribution

Posterior predictive checks

We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.

Math notes

Linear models

Continuous response \(Y\) (eg, heart rate) as a function of predictors; aka covariates, \(X_j\) (eg, cholesterol, smoker)

\[Y \sim N(\mu , \sigma^2)\] where the conditional mean is a linear function of the predictors

\[\mu = E(Y |X ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]

  • \(\beta_j\) are estimated by minimizing the sum of squared errors, \[\sum_i \Big(y_i - \big(\beta_0 + \sum_j \beta_j x_{ij}\big)\Big)^2 = ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2_2\]

  • We can interpret the coefficients directly as effect sizes.

Generalized linear models

We model a function of the conditional mean as a linear function

\[g(\mu ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]

\(g\) is called a link function.

Example: Logistic regression

\[Y \sim \textrm{Bernoulli}(\mu )\] Coin toss with probability \(\mu\) of heads, \(1 - \mu\) of tails.

\[g(\mu ) = \rm{logit} \ \mu := \log \left(\frac{\mu}{1 - \mu}\right) = X \boldsymbol{\beta}\]

\[\mu = \rm{logit}^{-1}(X \boldsymbol{\beta})\]

Likelihood

To find \(\beta_j\), we typically maximize the (log) likelihood function.

For logistic regression, with \(n\) independent observations \(y_i = 0\) or \(1\), the joint probability is

\[p(\mathbf{y} | X) = \prod_{i=1}^n \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}\]

The log likelihood is then

\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n y_i \log(\mu_i) + (1 - y_i)\log (1 - \mu_i)\]

Where are the \(\beta\)’s?

Recall that \(\mu_i = \rm{logit}^{-1}(X_i \boldsymbol{\beta})\)
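A tiny sketch of the log-likelihood computation above, with made-up fitted probabilities \(\mu_i\) rather than an actual fit:

```python
import math

def logistic_loglik(y, mu):
    """Bernoulli log likelihood: sum of y*log(mu) + (1 - y)*log(1 - mu)."""
    return sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
               for yi, mi in zip(y, mu))

# Toy data: two observations with illustrative fitted probabilities.
y = [1, 0]
mu = [0.8, 0.2]
print(logistic_loglik(y, mu))  # 2 * log(0.8), approximately -0.446
```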

Poisson regression

Counts \(Y\) modeled as

\[ p(Y = y; \mu) = \frac{\mu^y e^{-\mu }}{y!} \] with “canonical” link

\[ g(\mu ) = \log(\mu ) = X \boldsymbol{\beta} \]

Likelihood

\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)\]

Fun fact

For a linear model with Gaussian errors, the minimum sum of squares estimate is the same as the maximum likelihood estimate.

Models

  • Zero inflated Poisson regression

  • Zero inflated negative binomial regression

  • Bayes versions of the above

Negative binomial distribution

Often used on overdispersed data: \(\mu < \sigma^2\).

\[p(Y = y; \mu, \theta) = \binom{y + \theta - 1}{y} \left(\frac{\mu}{\mu + \theta}\right)^y \left(\frac{\theta}{\mu + \theta}\right)^\theta\]

  • Dispersion parameter \(\theta > 0\) is fixed; unlike \(\mu\), it does not vary with covariates.

  • \(\theta\) is estimated by the model

Conditional mean \(E(Y|X) = \mu\). We again model with log link

\[\log \mu = X \boldsymbol{\beta}\]

Variance

\[Var(Y|X) = \mu + \mu^2/\theta\]

\(\theta\) is estimated as part of the model fit, unlike \(\sigma^2\) in linear regression.
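A minimal sketch of this pmf, checking that it sums to one and has mean \(\mu\); the inputs reuse the overall mean and dispersion reported earlier purely as illustrative values:

```python
import math

def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial pmf in the (mu, theta) parameterization."""
    log_coef = math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
    return math.exp(log_coef
                    + y * math.log(mu / (mu + theta))
                    + theta * math.log(theta / (mu + theta)))

mu, theta = 0.512, 0.319
total = sum(nb_pmf(y, mu, theta) for y in range(200))   # ~1 (tail negligible)
mean = sum(y * nb_pmf(y, mu, theta) for y in range(200))  # ~mu
print(round(total, 6), round(mean, 3))
```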

Zero inflated Poisson regression

Define \(Z \sim Bernoulli(\pi)\) - a coin flip with probability \(\pi\) of heads.

\[ Y = \begin{cases} 0 & \text{if} \ Z = 1 \\ \sim Pois(\mu) & \text{if} \ Z = 0 \end{cases} \]

  • \(Z\) models the zero-inflated portion - the “extra” zeros.

  • The remaining count data, including additional zeros, is modeled with the Poisson distribution with mean \(\mu\).

\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]

(Recall Poisson distribution is \(P(y; \mu) = \mu^y e^{-\mu}/y!\))

  • \(\log \frac{\pi}{1 -\pi} =\) linear function of some covariates

  • \(\log(\mu) =\) linear function of some covariates
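The piecewise pmf above can be sketched directly; the values of \(\mu\) and \(\pi\) below are illustrative only:

```python
import math

def zip_pmf(y: int, mu: float, pi: float) -> float:
    """Zero-inflated Poisson pmf: extra mass pi at zero, else scaled Poisson."""
    pois = math.exp(-mu) * mu**y / math.factorial(y)
    return pi + (1 - pi) * pois if y == 0 else (1 - pi) * pois

mu, pi = 0.8, 0.25
total = sum(zip_pmf(y, mu, pi) for y in range(50))    # ~1 (tail negligible)
mean = sum(y * zip_pmf(y, mu, pi) for y in range(50))  # (1 - pi) * mu
print(round(total, 6), round(mean, 3))
```

Note the mean of the mixture is \((1 - \pi)\mu\), which is why the cohort effect shows up in both the count and zero-inflation portions.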

Likelihood function

We again estimate the coefficients by maximizing the log likelihood.

\[ \begin{align*} logLik(\mathbf{\beta}, \mathbf{\gamma}) = \sum_{i=1}^n \Big(& \mathbf{1}_{\{y_i = 0\}} \cdot \log \left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right) \\ & + \mathbf{1}_{\{y_i > 0\}} \cdot \left( \log(1 - \pi_i) + y_i \log(\mu_i) - \mu_i - \log(y_i!) \right) \Big) \end{align*} \]

Bayesian statistical modeling

Starts with Bayes’ Rule

The conditional probability of \(B_k\) given \(A\) is

\[\begin{aligned} P(B_k \ | \ A) & = \frac{P(A \ | \ B_k) P(B_k)}{P(A)} \\ & = \frac{P(A \ | \ B_k) P(B_k)}{\sum_i P(A \ | \ B_i) P(B_i)} \end{aligned}\]

where \(A\) and the \(B_j\) are events, with the \(B_j\) a partition of the sample space. The vertical bar denotes conditional probability.

The standard disease/positive test example

\[P(D | +) = \frac{P(+ | D) \ P(D)}{P(+ | D) P(D) + P(+ | \bar{D}) P(\bar{D})} \]

Bayes’ Rule for probability densities for model parameters

\[p(\boldsymbol{\theta} | y, X) = \frac{p(\boldsymbol{\theta}) p(y | \boldsymbol{\theta}, X)}{\int p(y | \boldsymbol{\theta}, X) p(\boldsymbol{\theta}) d\boldsymbol{\theta}}\]

  • random vector of model parameters \(\boldsymbol{\theta}\) (our \(\beta\)’s and \(\gamma\)’s)

  • fixed predictor data \(X\) and response data \(y\)

  • \(p(\boldsymbol{\theta})\) is the prior distribution for the parameters

  • \(p(y | \boldsymbol{\theta}, X)\) is the likelihood of the data given the parameters \(\theta\) - same as the likelihood function in frequentist statistics.

  • \(p(\boldsymbol{\theta} | y, X)\) is the posterior distribution for the parameters \(\boldsymbol{\theta}\).

The denominator is an integration over the parameter space; typically intractable \(\Rightarrow\) Markov Chain Monte Carlo (MCMC) methods to approximate the posterior distribution.

Posterior predictive distribution

Once you have the \(\theta\) posterior, you can compute the posterior predictive distribution of a new observation \(x_0\), given the data \(X\).

\[\begin{align*} p(x_0|X) & = \int p(x_0, \theta | X)\, d\theta \\ \\ & = \int p(x_0| \theta, X) \, p(\theta |X) \, d\theta \\ \\ & = \int p(x_0|\theta) \, p(\theta | X) \, d\theta\\ \end{align*}\]

Can also get a distribution for any statistic of interest.

Bayesian inference

  • Parameters are regarded as random variables, data fixed

  • We get an entire distribution for every parameter, and hence for any statistic we wish to compute

  • No p-values, no hypothesis testing!

Example

  • \(\pi\) is the probability of heads in a single coin toss,

  • data is the sequence of heads and tails in 100 tosses

By contrast, our maximum likelihood estimate of \(\pi\) would be the peak of the likelihood, which coincides with the mode of the posterior under a flat prior.
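A minimal sketch of this coin example, with hypothetical data (60 heads in 100 tosses) and a flat Beta(1, 1) prior, under which the posterior is Beta(heads + 1, tails + 1):

```python
# Coin-toss example with a flat Beta(1, 1) prior: the posterior is
# Beta(heads + 1, tails + 1), whose mode equals the maximum likelihood
# estimate. The data here are hypothetical.
heads, tails = 60, 40

a, b = heads + 1, tails + 1            # posterior Beta parameters
posterior_mode = (a - 1) / (a + b - 2)
posterior_mean = a / (a + b)
mle = heads / (heads + tails)

print(posterior_mode, mle, round(posterior_mean, 4))
```

The posterior mode equals the MLE exactly; the posterior mean is pulled slightly toward the prior mean of 0.5.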